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O ■ i INTRODUCTION 

b> ! 

. !-h ' Theoretical arguments suggest that the first astrophysical objects 
, ignited at redshifts z < 30-40. The ionizing radiation of these 
5^ ' early stars and black holes carved out bubbles of ionized hydrogen 
around them. As subsequent generations of objects formed, these 
HII regions gradually merged to create a complex HII morphology. 
This epoch of reionization terminated in so-called "overlap", when 
the myriad ionization fronts merged, and the only remaining pock- 
ets of hydrogen with a substantial neutral fraction were dense, self- 
shielded systems, such as Lyman limit systems (LLSs) and damped 
Lyman alpha systems (DLAs). 

Reionization is exceedingly complex, and involves uncer- 
tain, observationally-underconstrained astrophysics. Early inter- 
pretations of observations generally made the simplifying assump- 
tion that reionization is spatially homogeneous. However, an ex- 
tremely inhomogenous reionization see ms to be an un avoidable 
conclusion from recent analytic models l Furlanetto et al. 20( 3), as 
well as large-scale numeric (e.g. lCiardietaT]|2003l : Sokasian et al.l 



ABSTRACT 

It is generally taken for granted that reionization has completed by z — 6, due to the detection 
of flux in the Lya forest at redshifts z < 6. However, since reionization is expected to be 
highly inhomogeneous, much of the spectra pass through just the ionized component of the 
intergalactic medium (IGM) even for non-negligible values of the volume-weighted mean 
neutral hydrogen fraction, 5hi ■ We study the expected signature of an incomplete reionization 
at z^ 5-6, using very large-scale (2 Gpc) seminumeric simulations. We find that ruling out an 
incomplete reionization is difficult at these redshifts since: (1) quasars reside in biased regions 
of the ionization field, with fewer surrounding HI patches than implied by the global mean, 
Shi; this bias extends tens of comoving megaparsecs for xhi < 0.1; (2) absorption from the 
residual neutral hydrogen inside the ionized IGM generally dominates over the absorption 
from the remaining HI regions, allowing them to effectively "hide" among the many dark 
spectral patches; (3) modeling the Lya forest and its redshift evolution even in just the ionized 
IGM is very difficult, and nearly impossible to do a priori. We propose using the fraction of 
pixels which are dark as a simple, nearly model-independent upper limit on xm . Alternately, 
the size distribution of regions with no detectable flux (dark gaps) can be used to place a 
more model dependent constraint. Either way, the current sample of quasars is statistically 
insufficient to constrain 5hi at z ~ 6 to even the 10 per cent level. At z ~ 5, where there are 
more available sightlines and the forest is less dark, constraining xjji < 0.1 might be possible 
given a large dynamic range from very deep spectra and/or the Ly/3 forest. We conclude with 
the caution against over-interpreting the observations. There is currently no direct evidence 
that reionization was complete by z ~ 5 - 6. 
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iGeil & Wvithel2008l : lAlvarez et al]2008l ; lThomas et alj20Qg|) sim- 
ulations. The patchiness of reionization has many important con- 
sequences on the interpretation of observations, prompting calls 
for caution when interpreting Lyq damping wing absorption 
in hig h-z spectra dMesinger & Furlanetto! l2008ah IMcOuinn et all 
2008), redshift evo lution of the Lyman a l pha emitter (LAE) lu- 
minosity functio n ( Furlan etto et all 120061: IMcOuinn et all 120071 : 
iMesinger & Furlanetto 2008b; Il iev et al.ll2008l) , the apparent size 
of the proximity regio n in high- z quasar spectra dLidz et alii 20071 : 
lAlvarez & Abel 2007), and a sharp redshift evolu tion in the ef- 
fectiv e Gunn-Peterson (GP;lGunn & PetersorJll965l) optical depth, 
Tqp, jFurlanetto & Mesingej|2009h . 

Despite its complexity, a hard lower limit for the overlap phase 
of reionization of z ~ 6 is often quoted in the literature. This re- 
sult is justified by the detection of flux in the Lya GP throughs 
of z < 6 quasars (QSOs). Because of the high resonance optical 
depth of the Lya line, the detection of flux implies that the HI frac- 
tion in the interg alactic medium (IGM) is less than ~ 10~ 4 at these 
redshifts (e.g. lFan et al.ll20ol] : lBecker et alj|200ll : Diorgovsk i et all 
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l200ll ; ISongaira & Cowid2(X)3) . Although this number might be ap- 
propriate for the residual HI in the ionized component of the IGM, 
when viewed in the context of a patchy reionization the picture be- 
comes less clear; in fact, the step from "some flux is detected" to 
"overlap has occurred" might require a large leap of (reionization 
model based) faith. 

The interpretation of the QSO Lya forest in this regime is 
complicated for several reasons. Firstly, as reionization nears com- 
pletion, the remaining neutral patches become increasingly rare. 
These neutral patches also have very large line-of-sight (LOS) fluc- 
tuations, which are larger in ID than in the analogous 3D vol- 
ume (c.f. iLidz et al]|2006l for a related treatment of the sightline- 
to-sightline fluctuations arising from just the density field). Large 
sections of the line-of-sight therefore should pass through ionized 
IGM even for non-negligible values of the volume averaged hydro- 
gen neutral fraction, xm < 0.1. 

Further complicating matters is the fact that QSO host ha- 
los are extremely biased, and preferentially reside close to the 
centers of large HII regions, or large scale regions with fewer 
HI pa tches than implied by the global mean neutral fractio n 
(e.g. lLidz et alfcOOlUlvarez & AbefeOOllGeil & Wvithell2008h . 
Thus, one would have to look at spectral regions distant from the 
emission line center to get a representative sample of the ionization 
state of the Universe. 

At these high redshifts (z ~ 5-6), the GP troughs are already 
very dark with large sightline-to-sightline fluctuations even from 
the residual neutral hydrog en in HII regions assuming a uniform 
UV background (UVB ; e.g. lMiralda-Escude et af]|2000h [Fan et al.1 
2006; Lidz et a i] |2006h . Therefore, it is quite challenging to statis- 
tically distinguish in the Lya spectra the additional dark patches 
arising from pre-overlap neutral regions, from the dark patches al- 
ready present due to the resi dual neutral hydro gen, LLSs and DLAs 
in the ionized IGM. Indeed. ILTdz et alj f2007h point out that the fi- 
nal overlap stage of reionization might have occurred at z < 6, 
given these difficulties in interpreting the observed spectra. Here 
we quantify and further explore this scenario. 

Specifically, the purpose of this paper is to quantify how con- 
fident can we be from current Lya forest data that overlap did not 
occur at z < 6, given that reionization is very inhomogenous . For 
this purpose, we make use of the seminumeric simulation, Dexlvfl 
which allows us to efficiently generate density, halo, and ionization 
fields on very large scales and with a large dynamic range. This 
is an essential property of reionization studies, especially those in- 
volving rare, massive QSOs, since one must be able to statistically 
capture the ionization field, resolve M > 10 12 -10 13 M© QSO 
host halos, while including ionizing photons from very small halos 
capable of efficient atomic hydrogen cooling (T v i r > 10 4 K, or 
M > 10 s Mq for the redshifts in question). 

In Sj2] we briefly summarize the large-scale seminumeric sim- 
ulations we use for this study. In £13.11 we show how QSO host 
halos locations are biased with respect to the ionization field. In 
S]3.2| we present simple, general estimates of the excess absorption 
in the Lya forest from incomplete, patchy reionization. In i]3.3l we 
include the geometry of HI patches, and present distributions of 
spectral dark gaps. Finally in Sj4] we summarize our conclusions. 

We quote all quantities in comoving units. We adopt the back- 
ground cosmological parameters (J1a, ^m, n, o~s, Ho) = (0.72, 
0.28, 0.046, 0.96, 0.82, 70 km s" 1 Mpc" 1 ), matching the five-year 
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result s of the WMAP satellite founklev et~al]|2009l ; lKomatsu et ail 
120091) . 

Throughout the paper, we refer to overlap as the "end" of 
reionization. As mentioned earlier, the end of reionization is some- 
what difficult to define, since even after overlap a few percent of 
the mass weighted HI fraction remains in de nse absorption sys- 
tems such as LLSs, DLAs and galaxies (e.g. IProchaska & Wolfe! 
2009). These systems gene rally correspond to overdensities (e.g. 
Miralda-Escude et al. 2000), unlike the HI patches pre-overlap, 
which preferentially reside in voids due to the "inside-out" nature 
of reionization on large s cales. Furtherm ore, these absorption sys- 
tems are fairly small (e . g . I S chavell 200 II) occupying a modest frac- 
tion of the total volume of the Universe. Here we use the volume 
weighted neutral hydrogen fraction, xm, as a proxy for the reion- 
ization status. Thus we define "post-overlap" as the regime where 
xm — » 0, i.e. when the only remaining neutral regions are in dense, 
self-shielded systems. 



2 SEMINUMERICAL SIMULATIONS 

We use the seminumerical code DexM to generate evolved density, 
halo and ionization fields at z = 5 and 6. The halo a nd density 
schemes are presented in Mesinger & Fur lanettol d2007l) , to which 
we refer the reader for details. We use a new algorithm presented 
in Zahn et al. (in preparation) to generate ionization fields on ex- 
tre mely large scales without having to resolve the ionizing sources 
(seelZahn et alj2007l;lMesinger & Furlanettol | 2007l ; lGeil & Wvithel 
|2008| ; I Alvarez et al.l2008l ; lThomas et al.l2009l for alternate seminu- 
meric algorithms to generate 3D ionization fields). Here we briefly 
outline these simulations. 

Our simulation box is 2 Gpc on a side, with the final ioniza- 
tion fields having grid cell sizes of 3.3 Mpc. Halos are filtered out 
of the 1800 3 linear density field using excursion-set theory. Halo 
locations and the initial linear density field are then mapped to 
Eularian co ordinates at a give n redshift using first-order perturba- 
tion theory IZerDovichll 1 97fj|) . This approach is simil ar in spirit to 
the "peak-patch" formalism o f Bond & Mversl dl996h - The result- 
ing halo fields have been shown to match both the mass function 
and statistical clustering pro perties of halos in N-body si mulations, 
well past the linear regime ( Mesinger & Furlanetto 2007, Mesinger 
et al., in preparation). 

We use a new option in DexM which allows us to gen- 
erate ionization fields directly from the evolved density field; 
thus we are able to account for the ionizing contribution of 
halos which are too small to be resolved by our hal o finder. 
We use excursion-set formalism dFurlanetto et al.l [2004) to flag 
fully ionized cells in our box as those which meet the crite- 
ria /coil (x, z, R) > C ' where £ is some efficiency parame- 
ter and / co ii(x, z, R) is the collapse fraction smoothed on de- 
creasing scales, starting from a maximum i? m ax = 50 Mpc 
and going down to the cell size, R ce n. This maximum scale is 
chosen to roughly match the extra polated ionizing photon mean 
free path at these redshifts (e. g. Storrie-Lombardi et al. 1 19941 ; 
lMiral da-Escudej |2003l ; IChoudhurv et~al]|2008h . Howe ver, we use 
the conditional Press-Schechter (PS) formalism fcacev & Cold 
1993; Somerville & Kolatt 1999; specifically we use the "hybrid" 
form of iBarkana & Loen |2004|) on the evolved (i.e. non-linear) 
density field to calculate / co n(x, z, R). Additionally, we allow 
for partially-ionized cells by setting the cell's ionized fraction to 
C/coii(x, z, R cc u) at the last filter step for those cells which are 
not fully ionized. We also account for Poisson fluctuations in the 
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Figure 1. Ionization fields at z = 5, and xjji = 0.38, 0.10, 0.014, left to right. Ionized regions are shown in white. Each slice is 2 Gpc on a side and 3.3 Mpc 
deep. The square in the lower left of each panel frames a 100 Mpc X 100 Mpc region, roughly representative of the maximum size achievable by present-day 
numerical simulations which resolve halos down to the atomic cooling threshold. 



halo number, when the mean collapse fraction becomes small, 
/coii(x, z, -Rcdi) x MceU < 50M m in, where M ce ii is the total mass 
within the cell and our faintest ionizing sources correspond to a 
halo mass of M m i n = 2 x 1O 8 M0. This last step is found to be 
somewhat important when the cell size increases to ~ 1 Mpc. We 
generate ionization fields corresponding to several values of xm 
at each redshift by varying our efficiency parameter. Our ioniza- 
tion field algorithm is described in detail in Zahn et al. (in prepara- 
tion) and found to be in good agreement with cosmological radia- 
tive transfer algorithms, though we caution that tests were not yet 
preformed at the xm < 0.01 level. 

We show slices through our ionization fields in Fig. [T] for 
xm = 0.38, 0.10 and 0.014, left to right. Note that the neutral IGM 
is distributed quite inhomogeneously even at xhi = 0.38, with the 
maximum region of influence, i? ma x = 50 Mpc, damping the in- 
homogeneities on large scales. The 100 Mpc x 100 Mpc square 
in the lower right corner of the slices is roughly representative of 
the maximum size achievable with numerical simulations, if they 
wish to resolve ~ 1O 8 M0 halos which are expected to provide the 
bulk of ionizing photons. The square illustrates the power of the 
seminumerical approach. 

After creating the halo, density and ionization fields in our 2 
Gpc box, we extract randomly-oriented LOSs of length 500 Mpc 
originating from M > 3 X 10 Mq host halos. This mass range 
should roughly correspond to the observed high- z QSO host halo 
masses (Wvithe & Loeb 2003l; ICrotonll2009» , given that their black 
hole masses are estimated to be Afbh ~ 10 8 -10 9 Mq jKurk et all 
120071) . We conservatively extract 500 LOSs per halo, yielding a 
total of ~ 8 x 10 5 (10 7 ) sightlines at z = 6 (5). 

Below we discuss a few simple statistics concerning these 
LOSs. Specifically, we are interested in whether the current 22 
(101) published QSO spectra at 2 > 6 (5) are sufficient to dis- 
tinguish a pre-overlap from a post-overlap IGM. We stress that 



2 Here and throughout the paper, we shall use "conservative" to mean 
closer to the average global ionization state and with less scatter than ex- 
pected. Thus "conservative" choices make the detection of pre-overlap at 
z < 6 easier, in our application. In this case, by generating the same num- 
ber of LOSs for each halo, the typical LOS originates from the less massive, 
less biased halos on the low end of the mass range. 



we do not create detailed mock spectra for these purposes and 
take into account only absorption at the Lya line center. Instead, 
we approach the issue from a more general standpoint and try to 
draw simple conclusions with minimal model dependence. Since 
we are not making precise comparisons to observations, detailed 
mock spectra are both unnecessary and impossible to create given 
the comparatively low resolution of our boxes. Furthermore, ne- 
glected effects such as the damping wing absorption should not 
have a large impact on our conclusions, since the absorption profile 
in the xm <C 1 regime is on av erage weak and steep (c.f. Fig. 8 
in Mesinger & Furlanetto 2008a), and so should not have a large 
impact on many neighboring cells, given their large (3.3 Mpc) size. 



3 RESULTS 

3.1 Bias of QSOs with respect to the ionization field 

In Fig. [2] we show a 2 Gpc x 2 Gpc x 6.7 Mpc slice through our 
simulation box, at (z,xbi) = (6, 0.01). The neutral regions are 
marked in blue, and the quasar host halos are shown as red squares, 
with the length of the square proportional to the log of the halo 
mass. From the picture, one can qualitatively see that larger clus- 
ters of HI regions are preferentially farther away from the host ha- 
los. In our pre-overlap model, such HI patches are generally found 
in large-scale underdense regions or voids, since reionization pro- 
ceeds "inside-out" on large scales. 

We note that we conservatively do not include the ionizing ra- 
diation from the quasars in constructing the ionization field. The 
ionization field is computed directly from the evolved density field, 
using the expected (mean) collapse fraction + Poisson scatter due 
to discrete sources. Since this collapse fraction corresponds to an 
integral over a steep halo mass function, it is dominated by the 
abundance of sources much less massive than QSO host halos. One 
could in principle post-process the ionization fields and "paint-on" 
the QSO contribution around the QSO host halos, but this would 
involve many uncertain assumptions about QSO properties. Not in- 
cluding the QSO ionizing radiation in the ionization fields is a con- 
servative assumption, because it decreases how "over-ionized" or 
biased their environments are. 

We briefly quantify this bias in Fig. [3] which shows the cumu- 
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Figure 2. Slice through our simulation box at z = 6, Shi = 0.01. Partially 
neutral cells are shown in blue; quasar host halos are shown as red squares 
(enlarged for viewing purposes with the length of a side proportional to the 
log of the halo mass). The slice is 6.67 Mpc thick and 2 Gpc on a side. 
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Figure 3. Cumulative probability distributions that a LOS will encounter 
a partially neutral region at distances less than R, originating from M > 
3 X 10 12 M Q halos (solid curves) and random locations (dashed curves). 
The top panel corresponds to z = 6 and the bottom panel corresponds to 
z = 5. Pairs of curves are drawn from simulation boxes with Sjji = 01. 
0.01, and 0.001, top to bottom. 



lative probability distributions that a LOS will encounter a partially 
neutral region at distances less than R. The top panel corresponds to 
z — 6 and the bottom panel corresponds to z = 5. Pairs of curves 
are drawn from simulation boxes with Shi = 0.1, 0.01, and 0.001, 
top to bottom. The solid curves originate from our QSO host halos, 
while the dashed curves originate from random locations. 

There are two important things to note from this figure. Firstly, 
we confirm quantitatively that reionization, especially as probed by 
ID skewers, is extremely patchy. Most of the LOSs for Shi < 0.01 
do not intersect a single neutral region in 500 Mpc. To put this 
into perspective, 500 Mpc corresponds to the entire redshift interval 
from z = 6 to 5. Thus in this regime, a large majority of the Lya 
forest sightline pre-overlap goes through ionized IGM, and is by 
definition indistinguishable from sightlines going through a post- 
overlap IGM. 

Secondly, we see that the likelihood of encountering the first 
neutral patch within the first ~100 Mpc is significantly smaller for 
those sightlines originating at QSO host halos, than those originat- 
ing at random locations in the box. In fact, sightlines originating 
from QSO halos are over twice as likely to entirely go through HII 
regions within the first 60-80 Mpc, for Shi < 0.1. Again, we are 
being conservative in not explicitly including the ionizing radiation 
from the QSOs, which would make nearby neutral patches even 
more unlikely. Thus, Fig.[3]paints a pessimistic picture about being 
able to claim to measure Shi < 0.1 at this epoch. We will show 
the radial dependence of the spectral number density of these HI 
patches in the following section. 

3.2 Excess Absorption in the Pre-Overlap IGM 

Perhaps the simplest, least model-dependent upper limit on Shi can 
be obtained through the covering factor of dark pixels at a given 
redshift. As the Lya s pectra ar e roughly half d ark at z ~ 5 (see be - 
low and figures 4 in lFan et ai]|200 2 and 1 1 in Beck er et al. I l2007h . 
this upper limit is not particularly awe-inspiring. However, it is 
likely that given a large dynamic range, such as might be avail- 
able from very deep spectra and the Ly/3 forest, one might be able 
to place an upper limit of Shi < 0.1 (provided that the Universe 
is indeed so highly ionized). In this section we explore the possi- 
bility of doing better that this simple upper limit. This entails hav- 
ing some a priori knowledge of the Lya absorption in the ionized 
component of the IGM. Although very challenging, one could per- 
haps use the integrated observed UV luminosity function in lieu of 
extracting the UV background directly from the forest with the a 
priori assumption that it is in the post-overlap regime. Such an es- 
timate of the ionizing background inside the ionized IGM compo- 
nent, coupled with a prescription for estimating the residual neutral 
hydrogen distribution [e.g. radiativ e equilibrium and/or a simp le 
density threshold for self-shielding (Miral da"-Escude et alj |2000)1. 
can yield an independent estimate of the absorption inside the ion- 
ized IGM, which one can then apply to observed spectra in an effort 
to detect the excess absorption expected in a pre-overlap IGM. Un- 
fortunately, astrophysical uncertainties are large at the present time, 
and this procedure is unlikely to yield strong constraints. 

As explained in the introduction, detecting pre-overlap is more 
difficult than merely having enough LOSs to find one of the neutral 
patches seen in figures [2] and [3] One must be able to statistically 
distinguish such a dark region in the spectra from those already 
resulting from the residual HI in the ionized component of the IGM. 
As seen in the previous figures, a large majority of the LOS will go 
through ionized IGM, for these global neutral fractions, making 
them identical to the post-overlap spectra. We will now proceed to 
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within the ionized component of the IGM) probability of encoun- 
tering a dark spectral region with optical depth greater than r max . 
For simplicity of presentation, we ignore the radial dependence of 
^Hn k ( T max, z), which should in general be much weaker than the 
radial dependence of the ionization field and P{^f Tk (< R, xm, z). 
For our regime of interest (xm « 1) and neglecting spatial cor- 
relations, we can approximate the total probability of encountering 
a dark pixel at a distance R away from a QSO as: 



(< R,xm,z,T n 



rjdark . rjdark 
J~HI + "HII " 



r>dark rjdark 
J HI ~HII 



(i) 



Note that these probabilities are different than the ones plotted 
in Fig. [3] where we had plotted the likelihood of encountering the 
first neutral region at distances less than 7?. As defined, P^f rk and 
Pnn k are analogous to the covering fraction of dark regions, i.e. 
the fraction of pixels at fixed z which are dark. 

Therefore, the additional likelihood of finding a dark spectral 
region in a pre-overlap universe compared to a post-overlap uni- 
verse is: 



50 100 
R (Mpc) 



150 



AP dark (< J R,i H i, 



(2) 



Figure 4. The excess probability of finding a dark spectral region pre- 
overlap (compared to post-overlap) at distance R away from a quasar, at 
2 = 6 (top panel) and 2 = 5 (bottom panel). Dotted curves correspond 



As R — > 00 and Phii (imm, 2) — > 0, the likelihood of encoun- 
tering a dark region roughly approaches the global neutral fraction, 
xm- 

In Fig. [4] we plot this excess probability at 2 = 6 (top 
panel) and 2 = 5 (bottom panel). Solid curves correspond to 



to a post-overlap probability of P^ff k (r max , 2) = 0.6 (top panel) and Pnn (r max , 2) = 0.9 (top panel) and ££11 (w, «) = 0.4 (bot- 



5 (T ma x,2) = 0.02 (bottom panel) of encountering a dark region, 
roughly corresponding to the mean value in the Ly/3 forest. Solid curves cor- 
respond to P^ff k (r max , 2) = 0.9 (top panel) and P^ff k (r max , z) = 0.4 
(bottom panel), which is roughly equivalent to a maximum Lyce optical 
depth of T max = 3.5 (see text). Pairs of curves are drawn from simula- 
tion boxes with xjji = 0.1, 0.01, and 0.001, top to bottom. 



do some simple estimates on the amount of excess absorption in 
Lya spectra due to the neutral regions in the pre-overlap regime. 

Because we don't know redshift evolution of xm, the UVB, 
absorption systems, temperature, etc., the safest route is to iso- 
late the spectral imprint of the pre-overlap HI regions by analyzing 
spectra at a fixed redshift, instead of binning data in wide redshift 
bins. We first focus on the number of "dark" spectral regions in 
the spectra pre and post overlap. We note that the minimum sizes 
of dark spectral regions are resolution dependent. The resolution 
of our simulation is 3.3 Mpc, which is approximately three times 
the resolution of the Keck ESI spectra at these redshifts, when 
binned to R=2500 in order to inc rease the dynamical range (e.g. 
I White et al.ll2003l ; iFan et alj[200r3) . Thus, our resolution is some- 
what larger than the minimum size of a statistically significant de- 
tection of a "dark region" (r > 4) with present-day instruments. 
Note that very deep, high resolution spectra were taken for a few 
(~ 9) of these 2 > 5 objects with the Keck HIRES instrument 
(Becker et al. 2007). These can also be binned to lower resolution 
in order to increase the dynamical range. Alternately, one can sacri- 
fice some of the additional dynamical range obtained through bin- 
ning in order to resolve narrow transmission spikes, which might 
yield more discriminating dark gap distributions than those dis- 
cussed in $3.3\ 

We define Pnf rk (< R,x m ,z) to be the probability of en- 
countering a neutral region (i.e. not within the ionized component 
of the IGM) at a distance between R and R+dR away from a QSO. 



Similarly, we define Phu(t„ 



to be the post-overlap (i.e. 



torn panel). These were com puted from the 150 Mpc simulations in 
iMesinger & Furlanettd ( 2009), using a minimum source halo mass 
of M m i n = 1.6 x 10 8 Mq and ionizing photon mean free path 
of 40 Mpc, These values serve as a rough estimate of the frac- 
tion of the Lya spectra with absorption greater than the threshold 
T max = 3.5, when the UVB is adjusted to match the obse rved value 
of Tcf p = - ln(cxp[-r]) G p = 7.1 (2.1) at 2 ~ 6 (5) dFan et all 

2006) . The dotted curves correspond to a post-overlap probability 
of p£fI k (Y max , 2) = 0.6 (top panel) and P^ rk (r max , 2) = 0.02 
(bottom panel), corresponding to an increased dynamical range, 
r max = 11. This value is (very) roughly the mean detection limit 
for the Ly/3 forest for the Keck ESI. Again, the quoted values of 
^Hii k ( T max, 2) serve as a guide and should not be taken too seri- 
ously as we do not create detailed mock spectra, though they do 
seem to be in decent agree ment with observati ons (c.f. Fig. 4 in 
IFan et alj|2002l and Fig. 1 1 in lBecker et afll2007l) . 

Fig. E] further illustrates the biased ionization structure sur- 
rounding the QSO host halos. Even without including QSO ioniz- 
ing radiation, photons must travel tens of megaparsecs to get to a 
representative sample of the IGM. This distance becomes larger for 
smaller values of xm - The scale of this bias can even exceed the lo- 
cal QSO region of influence. In other words, the galaxies surround- 
ing the quasar can self-ionize (or pre-ionize if they turn on before 
the QSO as is likely) the surrounding IGM to scal es larger than the 
QSO 's proximity zone, for these values of xm (c.f. lAlvarez & Abell 

2007) . We plan to explore this effect further in a subsequent pa- 
per, in conjunction with hydrodynamical simulations. Note how- 
ever that using this number density statistic, the bias is not as ex- 
treme as with the "first HI patch" statistic presented in Fig. [3] 

The asymptotic value of AP dark (< R, xm, z , r max ) ~ 
xm(l - Pmi ) is approached at R ~ 30-60 Mpc. Strictly speak- 
ing, AP dark (< R, xm, z, r max ) can exceed this value somewhat, 
since most HI regions pre-overlap on this scale are partially ionized 
to few x 10%. As the dynamic range probed by the Lyman lines is so 
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Figure 5. The number of ~ 3 Mpc pixels, iV p ; xe i s , required to separate 
the pre-overlap and post-overlap expectation values of the number of dark 
regions by 2o\ In other words, Np^^ is the minimum value such that 
an observed number of dark pixels cannot be consistent with both a pre- 
overlap and post-overlap IGM, to within 2a. Curves correspond to the same 
models discussed in Fig. [4] iVpi xe i s scales as n 2 , where n is the number of 
standard deviations separating the two distributions. The lower horizontal 
short-dashed lines demarcate the present number of published QSO spectra 
at > z. The upper horizontal short-dashed lines demarcate the total number 
of ~ 3 Mpc pixels/segments available from published quasar spectra at 
> z. The horizontal long-dashed lines show this total for the Ly/3 forest. 

narrow, partially ionized regions pre-overlap are counted the same 
as fully neutral regions in these statistics. 

Before overlap, the expectation value of the number of dark 
pixels at a given redshift from a total of iVpixeis samples is 
AWds.Pt d ot rk - Again, P t d t rk is the likelihood a pixel will be dark 
either from an HI region or from a dark region within the ionized 
IGM. After overlap, when the entire IGM is ionized, the expecta- 
tion value of the number of dark pixels is smaller: ATpixels-fHn • 
The PDFs of these two distributions should be binomial with these 
mean values. Therefore, in order to separate the expectation values 
pre and post overlap by n standard deviations, a, we require: 



Ar 7->dark * r j-jdark . 

Apixcls^tot — nfJtot > JVpixels-fHII + WHII 



(3) 



In other words, iVpixeis is the minimum value such that an observed 
number of dark pixels cannot be consistent with both a pre-overlap 
and post-overlap IGM, to within no. 

In Fig. [5] we plot the quantity A/pixels for n — 2, for the same 
models shown in Fig. [4] The lower horizontal short-dashed lines de- 
marcate the present number of published QSO spectra at > z. This 
limit corresponds to the least model-dependent constraint possible 
on xm, as it doesn't involve binning data in wide redshift bins or 
modeling the redshift evolution of these parameters. 

If one is allowed to make assumptions about the redshift evo- 
lution of xm, and the absorption in the ionized IGM, then one can 
use wider spectral regions, approaching the Lya line and the in- 
trinsic source redshift. This is the most information we can hope 



to squeeze out from the observed sample of quasar spectra, us- 
ing this simple statistic. We show this limit as the upper horizontal 
short-dashed lines, demarcating the total number of ~ 3 Mpc pix- 
els/segments in published Lya forest spectra available for analysis 
from quasars at > z, The horizontal long-dashed lines show this 
total for the Ly/3 forest. Our horizontal lines are also conservative 
limits in that we count spectral regions all the way to the quasar 
redshift. In reality, the biasing seen in the previous figures from the 
surrounding ionization field, a well as from the QSO's ionizing ra- 
diation will make the region within R < 50 Mpc unusable for this 
analysis. Requiring that the quasar is at redshifts greater than z + 50 
Mpc, eliminates roughly half (quarter) of the total sample at z = 6 
(5), though the total LOS length is only weakly affected. 

Furthermore, this analysis is extremely conservative in that it 
assumes perfect a priori knowledge of the absorption caused by the 
ionized component of the IGM, i.e. Phii ■ Even with such unre- 
alistic confidence in our ability to model the UVB, density, and 
temperature distributions in the ionized we see from Fig. [5] 

that many LOSs are required to constrain the additional absorption 
caused by xm < 0.1. In practice using this covering fraction statis- 
tic without a priori knowledge of the absorption inside the ionized 
IGM, means that a pre-overlap IGM is degenerate with a "darker" 
post-overlap IGM (e.g. with a weaker UVB). This means that ob- 
taining constraints on aim is much more difficult than implied by 
Fig.S 

At z = 6, there is no hope of using the current sample of 22 
2 > 6 QSOs to constrain Shi to the ~ 10% level, using the mean 
excess absorption in the Lya or Ly/3 forest. At z — 5, one could 
marginally constrain xm < 0.1, from the available data, but only 
in a model-dependent way. 

The Ly/3 forest probes a larger dynamic range, and so can bet- 
ter discriminate between the residual HI in the ionized IGM and the 
more highly neutral patches pre-overlap (e.g. lMesinger & Haimanl 
2004). Thus the dotted curves in Fig.[5]can be significantly lower 
than the solid curves. Fig. [5] suggests that the Ly/3 forest can be 
used to potentially constrain xm < 0.1 at z = 5. 

The precise interpretation of the Ly/3 forest is hampered by 
the presence of lower redshift Lya forest absorptio n, contributing 
fluctu ations in the transmitted flux of order ~ 20% jDiikstra et al.l 
2004). Furthermore, the Ly/3 forest (with its red edge correspond- 
ing to the onset of the Ly7 forest) is narrower than the Lya forest. 
Thus the horizontal long-dashed curves in Fig.[5]are lower than the 
short-dashed ones. Therefore detailed simulated spectra should be 
employed when comparing with observations to further study the 
utility of the Ly/3 forest. Nevertheless, we stress that using the cov- 
ering fraction of dark pixels in the Ly/3 forest likely provides the 
simplest, least-model dependent upper limit on the value ofxm- 

Finally, we stress that the main conclusions presented in 
this section are unaffected by the precise geometry of the neutral 
patches. In the 7? — > oo asymptotic limit, the covering fraction of 
the HI patches roughly approaches xm, and so is not affected by 
the possible inaccuracies of our ionization field models. 



3 This is challenging not only since it is model-dependent, but also since 
the best estimate of the UVB and temperature comes from the Lya forest, 
with the a priory assumption that it is in the post-overlap regime. A Lya 
forest in a pre-overlap IGM would bias the derived quantities. 
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3.3 Dark Gap Distribution 

Thus far we have ignored geometric information imprinted by the 
pre-overlap absorption features. The sizes of regions with no de- 
tectable flux ("dark gaps") can potentially provide useful addi- 
tional information about the state of the IGM (e.g. ICrofq 1 19981 ; 
ISongaila & Cowiejl2002l ; rBarkanj|2002l) . The simplest approach in 
dark gap s tudies is to asso ciate pre-overlap with larger spectral dark 
gaps (e.g. lFan et al hood and references therein). In practice how- 
ever, obtai ning constraints is dif ficult even under idealized assump- 
tions (e.g. lGallerani et alj20o3) . 

In order to gain physical insight into the use of this approach 
and the robustness of its application, one should be able to disen- 
tangle redshift, Shi (which depends on the large-scale ionization 
history), and the absorption inside HII regions (which in turn de- 
pends on the UVB, density and temperature fields). Numerical sim- 
ulations generally relate all of these quantities via some star for- 
mation prescription, resulting in a single realization of a possible 
reionization history. This lack of versatility makes it difficult to ro- 
bustly isolate the imprint of patchy reionization from mock spectra. 
Further complicating matters are the large parameter space to ex- 
plore and the enormous dynamic range, as simulations should sta- 
tistically sample the large-scale environments of QSOs (with halo 
masses of ~ 1O 13 M0), while resolving the small mass halos domi- 
nating the ionizing photon budget (with halo masses of ~ 1O 8 M0). 
Therefore it is quite useful to extend the available dynamic range 
through seminumeri cal simulations su ch as ours, or sub-grid nu- 
merical simulations dKohler et ah 2007). 

Here we take a more general approach, by separately study- 
ing the dark gaps from HI patches and those from residual neutral 
hydrogen in HII regions. We vary vary z, aim, and the mean GP 
absorption from the ionized IGM, parametrized by its tqp- Our 
poor resolution prevents us from probing the dark gap distribution 
on small scales, but present technology makes these scales difficult 
to observe anyway, si nce spectra need to be binned to increase the 
dynamical range (c.f. lWhite et"aT1l2003b . 

In Fig. [6] we plot the number of dark gaps per unit gap length 
per comoving Mpc of the LOS, at z = 6 (top) and z — 5 (bottom). 
Solid and dashed curves are extracted from t he post-overlap ionized 
IGM i n the seminumerical simulations of Mesinger & Furlanettol 
(2009) assuming r max = 3.5. The dashed curve corresponds to 
tqp = 4, and solid curves correspond to the observed^] values of 
r^p = 7.1 (top panel) and tqp =2.1 (bottom panel) JFan et al.l 



t = 3.5 

max 



120061) . Dotted curves show the contribution from neutral IGM 
patches pre-overlap, and correspond to xhi = 0.1, 0.01, and 0.001, 
right to left. The total pre-overlap dark gap PDF is approximately 
the sum of the distributions in the ionized and neutral IGM. 

Interestingly, Fig.[6]shows that the dark gap PDF arising from 
the neutral patches pre-overlap is insensitive to redshift over this 
range. The absolute normalizations scale with khi, but the shapes 
of these PDFs are fairly constant. 

The dark gap distribution from the ionized IGM has a more 
complex behaviour. We see that as the Lya forest in the ionized 
IGM gets darker, the distribution of dark gaps flattens. The ob- 
served (going through ionized and potentially also neutral IGM) 
GP effective mean optica l depth evolves f rom tqp = 2.1 at z ~ 5 
to Top = 7.1 at 2 ~ 6 dFan et alj|2003) . These values set upper 
limits to the amount of absorption coming from just the ionized 
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Figure 6. The number of dark gaps per unit gap length per spectral Mpc, 
at 2 = 6 (top) and 2 = 5 (bottom). Solid and dashed curves are ex- 
tracted from the post-overlap ionized IGM in the seminumerical simula- 
tions of Mesineer & Furlanetto 1 2009) assuming r m ax = 3.5. The dashed 
curve corresponds to an effective GP optical depth through entirely ionized 
IGM of T-Qp = 4, and solid curves correspond to the observed values of 
Tq^> = 7.1 (top panel) and TQp = 2.1 (bottom panel). Dotted curves show 
the contribution from neutral IGM patches pre-overlap, and correspond to 
^HI = 01> 0.01, and 0.001, right to left. The total pre-overlap dark gap 
PDF is approximately the sum of the distributions in the ionized and neu- 
tral IGM. 
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4 Note that here we define rg* only with respect to the ionized IGM. The Figure ?> Analogous to Fig.© but with r max = 11. 

"true" TQp in the hya forest pre-overlap can have a contribution from ab- 
sorption in HI regions. 
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Figure 8. Analogous to the bottom panel of Fig. [7] but including some 
additional models of the ionization field. The dotted curves show our fidu- 
cial model with M min = 2 X 10 s M & and -Rmax = 50 Mpc, with 
the added blue curve corresponding to Shi = 0.38. There are two ad- 
ditional distributions for xm = 0.1: the dashed curve corresponds to 
M min = 5 X 1O 9 M and R max — 50 Mpc and the dot-dashed curve 
corresponds to M min = 2 X 10 8 Af Q and -R max = 15 Mpc. 

phase of the IGM. Hence, the true dark gap distribution in the ion- 
ized IGM should be similar to or steeper than the solid curves in 

Fig.m 

At z = 5 the dark gap PDF from the ionized IGM is steeper 
than the PDF from the neutral IGM. In this regime, a high- value tail 
in an observed dark gap distribution would indeed be an indication 
of incomplete reionization. At z = 6 however, the situation is more 
ambiguous. Here the dark gap distribution from the ionized IGM 
can be flatter than the distribution from the neutral IGM. In this 
regime, a high- value tail in observed spectra could result from ei- 
ther the ionized component or the neutral component. Throughout 
this redshift range, a signature of pre-overlap seems to be a "knee" 
in the total observed dark gap distribution, as the PDFs from the 
ionized and neutral components are bound to have different slopes 
within 5 < z < 6. 

Therefore at first glance, Fig. [6] hints at two signatures of in- 
complete reionization: (1) the presence of a high- value (/dark > 
tens of megaparsecs) tail in the observed dark gap distribution at 
2 < 5; and (2) a redshift-independent signature of a knee in the 
dark-gap distribution. Do we have enough data to statistically ex- 
tract these signatures? 

At 2 > 6, there are a total of 22 published QSO spectra, with 
a total comoving path length of ~ 1200 Mpc. Looking at the top 
panel of Fig. [6] this is likely one to two orders of magnitude too 
small to create an accurate dark gap PDF. Furthermore, most QSOs 
are at redshifts close to z ~ 6, thus incapable of probing the high- 
value tail, even if their near zones and ionization field bias could be 
included in the analysis. The dark gap PDF from just the ionized 
IGM is already so flat that large gaps on cosmological scales are 
likely. Therefore redshift evolution must be taken into account (see 
below). 

As 2 > 5, there are a total of 101 published QSOs, with a total 
of ~ 26 Gpc of combined spectral length of the Lya forest. Even 
excising the closest ~ 50 Mpc to each quasar yields > 20 Gpc with 
which to probe the dark gap PDF. Comparing to the bottom panel 
of Fig. [6] this seems like a decently large number, allowing one to 
constrain Shi < 0.1 from the dark gap PDF. 

However, in order to take advantage of the available spectral 
path lengths, one must average over the wide redshift intervals these 



spectra span. The resulting total dark gap PDF will absorb the un- 
certain redshift evolution of both the dotted and solid curves in Fig. 
[6] As LOS skewers piercing the ionized IGM get rapidly darker at 
2 > 5, their corresponding contribution to the total dark gap PDF 
will get rapidly flatter with redshift, even over fairly small redshift 
bins. The resulting redshift-averaged PDFs will likely smear out 
any of the signatures of incomplete reionization discussed above. 
Thus to get constraints on xhi even at the 10% level, one would 
again need to rely heavily on our ability to accurately model the 
absorption from residual HI in the ionized IGM, and its redshift 
evolution. 

The situation improves with a larger dynamic range, shown in 
Fig. |7] where we plot the same quantities, but taking r max = 11. 
This value of r max roughly corresponds to an optimistic predic- 
tion for the Ly/3 forest, provided that the lower redshift Lya for- 
est can be statistically extracted. We see that the dark gap distri- 
butions from an ionized IGM are steeper, thereby increasing the 
confidence at which one can associate a long dark gap with incom- 
plete reionization at 2 ~ 5. An increased dynamic range might 
be available with new instruments such as the James Webb Space 
Telescope (JWST) and the ten meter telescope (TMT), or through 
long-integration w ith existing instruments such as Keck HIRES 
teeckeretalll2007h . or with higher-order Lyman lines, such as 
Ly/3. Although an extended dynamical range is more readily avail- 
able through the Ly/3 forest in existing spectra, one must be care- 
ful in modeling the fluctuating absorption from the superimposed 
lower redshift Lya forest. 

In Fig. [8] we plot several additional dark gap distributions at 
2 = 5, varying two parameters of our ionization models. The dot- 
ted curves show our fiducial model with Mmin — 2 x 10 8 A/q 
and -R m ax = 50 Mpc, with the added blue curve corresponding to 
xhi = 0.38. We see that this curve is noticeably flatter than the 
curves corresponding to lower neutral fractions. There are two ad- 
ditional distributions for xhi = 0.1: the dashed curve corresponds 
to Mmin = 5 x 10 9 Mq (as might be the case if feedback is very 
strong) and -Rmax = 50 Mpc while the dot-dashed curve corre- 
sponds to M m i n = 2 x 10 8 Af Q and .Rmax = 15 Mpc [as might be 
the case if r ecombinations act as ef ficient photon sinks in ionized 
regions (e.g. iFurlanetto & Ohll2005l) l. We see that the scenarios in 
which the sources are more homogeneous (lower Mmin) and more 
localized (lower -Rmax), result in a more uniform (i.e. steeper) dis- 
tribution of dark gaps. However, the differences in the curves at 
fixed xhi are relatively small, especially when compared to the dis- 
tributions of dark gaps from the ionized IGM. Therefore, we do not 
expect our main conclusions in this section to be very sensitive to 
the uncertainties in these parameters. 



4 DISCUSSION 

Much controversy has surrounded various claims concerning the 
ionization state of the Universe at 2 > 6, as probed by the spectra of 
high redshift quasars. Ironically, the same arguments used to claim 
that the spectra of these quasars are consistent with reionization 
completing by 2 >> 6 can also be used to claim the spectra are 
consistent with reionization not completing by 2 < 6. 

In this work, we re-examine the often-quoted evidence of 
xhi < 10~ 4 -10~ 3 from the detection of flux in the spectra of 
2 < 6 QSOs. Because reionization is very patchy, and the remain- 
ing neutral patches become very rare as reionization progresses, 
it is difficult to claim reionization was complete by this redshift 
dLidz et alj2"007l) . 
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To quantify these difficulties, we take advantage of the large 
dynamic range provided by the seminumerical modeling tool, 
DexM. Our simulation boxes are 2 Gpc on a side, resolve the likely 
host halos of QSOs, and generate ionization fields accounting for 
ionizing photons from halos with virial temperatures of ~ 10 4 K. 
We extract ~ 10 6 (10 7 ) sightlines through the ionization field at 
z = 6(5) originating from identified QSO host halos. However, we 
do not make detailed simulated spectra for this study. 

We confirm that reionization is very inhomogeneous. In fact, 
most sightlines through a xm < 0.01 universe do not intersect a 
single neutral region in a 500 Mpc stretch (corresponding to the 
redshift interval from z = 6 to z = 5. Furthermore, QSO host 
halos reside in very biased locations of the ionization field, where 
neutral patches are even rarer. Even without explicitly including the 
quasar's own ionizing radiation, sightlines originating from QSO 
halos are over twice as likely than those from random locations to 
entirely go through the ionized IGM within the first 60-80 Mpc, 
for Shi < 0.1. As measured by the number density of HI patches, 
the bias is less extreme, and the asymptotic values are approached 
at distances of ~ 30-60 Mpc away from the quasar. 

Even before overlap, the absorption in the Lya forest is dom- 
inated by the residual neutral hydrogen inside the ionized IGM 
at these redshifts. Detecting the additional dark spectral regions 
caused by HI patches pre-overlap is very challenging, and cannot 
be done from the existing Lya forest data at z ~ 6. At z ~ 5, where 
there are more available sightlines and the forest is less dark, con- 
straining xhi < 0.1 might be possible given a large dynamic range 
from very deep spectra and/or the Ly/3 forest. 

To further quantify the imprint of an incomplete reionization, 
we also create dark gap distributions. In order to be systematic, we 
separately examine the dark gap PDFs arising from the residual 
hydrogen in the ionized IGM, and those from just the HI patches 
pre-overlap. We find that these two PDFs are likely to have different 
slopes, which can result in a "knee" feature in the combined PDF. 
At z = 5, large dark gaps preferentially come from the HI patches 
pre-overlap; however, at z = 6, large dark gaps could instead pref- 
erentially come from the ionized IGM. Unfortunately, averaging 
the PDFs over redshift bins might smear out these signals, since 
the gap distribution from the ionized IGM has a strong redshift de- 
pendence. 

A larger dynamical range would aid in the discriminating 
power of the current sample of QSO spectra, in both the mean ab- 
sorption statistic and the dark gap distributions. An increased dy- 
namic range might be available with new instruments such as the 
JWST and the TMT, or through l ong-integration wit h existing in- 
struments such as Keck HIRES (Bec ker et alj|2007h and ESI, or 
with higher-order Lyman lines, such as Ly/3. In the case of the later, 
one must be careful in modeling the fluctuating absorption from the 
superimposed lower redshift Lya forest. 

We propose using the fraction of pixels which are dark as a 
simple, nearly model-independent upper limit on xm- This simple 
statistic might be sufficient to constrain xm < 0.1 at z ~ 5 using 
deep spectra of the current sample of z > 5 quasars. 

In conclusion, there is currently no direct evidence that reion- 
ization was complete by z ~ 5 — 6. We stress that we are not 
promoting such a late reionization scenario. This paper is merely 
intended as a caution against over-interpreting high-redshift obser- 
vations. 
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